Chinese Verb Sense Discrimination Using an EM Clustering Model with Rich Linguistic Features

Authors

  • Jinying Chen
  • Martha Palmer
Abstract

This paper discusses the application of the Expectation-Maximization (EM) clustering algorithm to the task of Chinese verb sense discrimination. The model uses rich linguistic features that capture predicate-argument structure information of the target verbs. A semantic taxonomy for Chinese nouns, built semi-automatically from two electronic Chinese semantic dictionaries, provides semantic features for the model. Purity and normalized mutual information were used to evaluate the clustering performance on 12 Chinese verbs. The experimental results show that the EM clustering model can successfully learn sense or sense group distinctions for most of the verbs. We further enhanced the model with certain fine-grained semantic categories called lexical sets. Our results indicate that these lexical sets improve the model’s performance for the three most challenging verbs chosen from the first set of experiments.
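The two evaluation measures named in the abstract can be illustrated with a minimal sketch. The abstract does not say which normalization of mutual information the authors used, so the geometric-mean form NMI = I(C; Y) / sqrt(H(C) H(Y)) below is an assumption, as are the function names and the toy data.

```python
import math
from collections import Counter


def purity(clusters, labels):
    """Fraction of instances that carry the majority gold sense of their cluster."""
    per_cluster = {}
    for c, y in zip(clusters, labels):
        per_cluster.setdefault(c, Counter())[y] += 1
    majority_total = sum(max(counts.values()) for counts in per_cluster.values())
    return majority_total / len(labels)


def entropy(assignments):
    """Shannon entropy (natural log) of a list of discrete assignments."""
    n = len(assignments)
    return -sum((k / n) * math.log(k / n) for k in Counter(assignments).values())


def nmi(clusters, labels):
    """Mutual information normalized by sqrt(H(C) * H(Y)) (assumed variant)."""
    n = len(labels)
    joint = Counter(zip(clusters, labels))
    cluster_sizes = Counter(clusters)
    label_sizes = Counter(labels)
    mi = sum(
        (k / n) * math.log(k * n / (cluster_sizes[c] * label_sizes[y]))
        for (c, y), k in joint.items()
    )
    denom = math.sqrt(entropy(clusters) * entropy(labels))
    return mi / denom if denom > 0 else 0.0


# Hypothetical toy data: six instances of one verb, two induced clusters,
# two gold senses "a" and "b".
toy_clusters = [0, 0, 0, 1, 1, 1]
toy_senses = ["a", "a", "b", "b", "b", "b"]
toy_purity = purity(toy_clusters, toy_senses)
toy_nmi = nmi(toy_clusters, toy_senses)
```

Purity rewards clusters dominated by a single sense but can be inflated by producing many small clusters; normalized mutual information penalizes that, which is one reason the two measures are often reported together.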

Similar resources

Features of Verb Complements in Co-composition: A case study of Chinese baking verb using Weibo corpus

In the Generative Lexicon Theory (GLT), co-composition is one of the generative devices proposed to explain the cases of verbal polysemous behavior where more than one function application is allowed. The English baking verbs were used as one of the examples to illustrate how their complements co-specify the verb with qualia unification. In this paper, we begin by exploring the polysemy of Chin...

AFAST: An Automatic Frames Acquisition System

This paper describes an unsupervised strategy to acquire lexico-semantic frames (LSFs) of verbs from sentential parsed corpora (in syntactic level). The problems of acquiring LSFs consist of verb sense ambiguity, diversity of linguistic usages, and lack of completed frame slots in a single sentence. We propose a specific clustering technique based on the Minimum Description Length (MDL) princ...

Supervised Morphology Generation Using Parallel Corpus

Translating from English, a morphologically poor language, into morphologically rich languages such as Persian comes with many challenges. In this paper, we present an approach to rich morphology prediction using a parallel corpus. We focus on the verb conjugation as the most important and problematic phenomenon in the context of morphology in Persian. We define a set of linguistic features usi...

Verb Sense and Subcategorization: Using Joint Inference to Improve Performance on Complementary Task

We propose a general model for joint inference in correlated natural language processing tasks when fully annotated training data is not available, and apply this model to the dual tasks of word sense disambiguation and verb subcategorization frame determination. The model uses the EM algorithm to simultaneously complete partially annotated training sets and learn a generative probabilistic mod...

Aligning Features with Sense Distinction Dimensions

In this paper we present word sense disambiguation (WSD) experiments on ten highly polysemous verbs in Chinese, where significant performance improvements are achieved using rich linguistic features. Our system performs significantly better, and in some cases substantially better, than the baseline on all ten verbs. Our results also demonstrate that features extracted from the output of an auto...

Publication date: 2004